library(fpp2)
library(zoo)
library(plotly)
Use the help function to explore what the series gold, woolyrnq and gas represent.
Use autoplot() to plot each of these in separate plots.
data(gold)
autoplot(gold)
data(woolyrnq)
autoplot(woolyrnq)
data(gas)
autoplot(gas)
What is the frequency of each series? Hint: apply the frequency() function.
frequency(gold)
## [1] 1
frequency(woolyrnq)
## [1] 4
frequency(gas)
## [1] 12
Use which.max() to spot the outlier in the gold series. Which observation was it?
# position of the outlier
which.max(gold)
## [1] 770
#value of the outlier
gold[which.max(gold)]
## [1] 593.7
Download the file tute1.csv from the [book website](http://otexts.com/fpp2/extrafiles/tute1.csv), open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read.csv("http://otexts.com/fpp2/extrafiles/tute1.csv", header=TRUE)
head(tute1)
## X Sales AdBudget GDP
## 1 Mar-81 1020.2 659.2 251.8
## 2 Jun-81 889.2 589.0 290.9
## 3 Sep-81 795.0 512.5 290.8
## 4 Dec-81 1003.9 614.1 292.4
## 5 Mar-82 1057.7 647.2 279.1
## 6 Jun-82 944.4 602.0 254.0
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
#(The [,-1] removes the first column which contains the quarters as we don’t need them now.)
autoplot(mytimeseries, facets=TRUE)
#Check what happens when you don’t include facets=TRUE.
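To see the effect of `facets`, the sketch below rebuilds a six-quarter slice of tute1 from the `head()` output shown above (so it runs without the download) and draws it both ways. It assumes the fpp2 package is installed; `tute1_head`, `small_ts`, `p_facets` and `p_single` are illustrative names, not part of the exercise.

```r
library(fpp2)  # attaches forecast and ggplot2, which supply autoplot() for ts objects

# First six quarters of tute1, copied from the head() output above
tute1_head <- data.frame(
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)
small_ts <- ts(tute1_head, start = c(1981, 1), frequency = 4)

p_facets <- autoplot(small_ts, facets = TRUE)  # one panel per series
p_single <- autoplot(small_ts)                 # all three series in a single panel

print(p_facets)
print(p_single)
```

In the single-panel version the three series share one y-axis, so GDP (roughly 250 to 290 here) is compressed beneath Sales (roughly 800 to 1060), which makes its movements harder to read.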
Download some monthly Australian retail data from the [book website](https://otexts.com/fpp2/extrafiles/retail.xlsx). These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
temp = tempfile(fileext = ".xlsx")
dataURL <- "https://otexts.com/fpp2/extrafiles/retail.xlsx"
download.file(dataURL, destfile=temp, mode='wb')
retaildata <- readxl::read_excel(temp, skip=1)
#The second argument (skip=1) is required because the Excel sheet has two header rows.
head(retaildata)
## # A tibble: 6 x 190
## `Series ID` A3349335T A3349627V A3349338X A3349398A A3349468W
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1982-04-01 00:00:00 303. 41.7 63.9 409. 65.8
## 2 1982-05-01 00:00:00 298. 43.1 64 405. 65.8
## 3 1982-06-01 00:00:00 298 40.3 62.7 401 62.3
## 4 1982-07-01 00:00:00 308. 40.9 65.6 414. 68.2
## 5 1982-08-01 00:00:00 299. 42.1 62.6 404. 66
## 6 1982-09-01 00:00:00 305. 42 64.4 412. 62.3
## # ... with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## # A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## # A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## # A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## # A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## # A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## # A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## # A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## # A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## # A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## # A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## # A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## # A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## # A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## # A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## # A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## # A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## # A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## # A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## # A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## # A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## # A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## # A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## # A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## # A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## # A3349365F <dbl>, A3349746K <dbl>, ...
myts <- ts(retaildata[,"A3349396W"], frequency=12, start=c(1982,4))
Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf().
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
The time plot shows:
* Annual seasonality, with a peak in December and a low in February
* A consistent upwards trend
* No evidence of cyclic behavior
theme_set(theme_light(base_size = 12))
p <- autoplot(myts) + ggtitle("Retail Data A3349396W") + xlab("Year")
ggplotly(p)
The seasonal plot confirms the seasonality and upwards trend found in the time plot. It also shows that the upwards trend has been accelerating.
ggseasonplot(myts, polar = TRUE) + ggtitle("Retail Data A3349396W") + xlab("Month")
This subseries plot confirms the upwards trend, the peak in December and the trough in February. Other than this, the subseries plot for this time series does not tell us much.
ggsubseriesplot(myts) + ggtitle("Retail Data A3349396W") + xlab("Month")
The lagplot has the strongest linear relationship at lag 12, confirming that the data has annual seasonality. However, it also shows a positive linear relationship for most of the lag plots. This is because the time series is nearly always increasing.
gglagplot(myts) + ggtitle("Retail Data A3349396W")
The autocorrelation plot has r12 slightly higher than the other lags, due to the annual seasonal pattern in the data. All lags are above the dashed blue lines, which indicates the correlations are significantly different from 0.
ggAcf(myts) + ggtitle("Retail Data A3349396W") + xlab("Lag")
Each of these plots has confirmed that the series has an annual seasonality and an upwards trend.
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
These plots have indicated that the series has an annual seasonality and is strongly dependent on the previous month of data. There is also some evidence of cyclical behavior every 6-9 years.
The time plot shows:
* Annual seasonality, with a peak in February and a trough in December
* No evidence of a trend
* Cyclic behaviour every 6-9 years
theme_set(theme_light(base_size = 12))
p <- autoplot(hsales) + ggtitle("hsales") + xlab("Year")
ggplotly(p)
The seasonal plot confirms the seasonality. It also reveals a peak in March and trough in December.
ggseasonplot(hsales, polar = FALSE) + ggtitle("hsales") + xlab("Month")
This subseries plot shows the seasonal behavior of the time series (increasing from January to March, decreasing from March to December).
ggsubseriesplot(hsales) + ggtitle("hsales") + xlab("Month")
The lag plot has the strongest linear relationship at lag 1. The relationship weakens as the lag increases, then strengthens again at lag 12. This suggests that the series has some annual seasonality but depends mainly on the previous month of data.
gglagplot(hsales) + ggtitle("hsales")
The autocorrelation plot has r12 higher than its neighboring lags, due to the annual seasonal pattern in the data. Even higher is r1, which confirms that the previous month in the time series is indicative of the next. All lags except 16 to 22 are above the dashed blue lines, which indicates the correlations are significantly different from 0.
ggAcf(hsales) + ggtitle("hsales") + xlab("Lag")
The predictability of an event or quantity depends on several factors.
Forecasting methods depend on what data are available and the predictability of the quantity to be forecast. Methods include:
1. Naive method - Using the most recent observation as a forecast
2. Judgemental forecasting - For when historical data are not available. More accurate when the forecaster has important domain knowledge and more up-to-date data.
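The naive method can be sketched in a few lines of base R. The series below reuses the six Sales values from the tute1 `head()` output purely as sample data; `h`, `last_obs` and `naive_fc` are illustrative names.

```r
# Sample quarterly series (six Sales values, used here only for illustration)
y <- ts(c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
        start = c(1981, 1), frequency = 4)

h <- 4                                 # forecast horizon, in quarters
last_obs <- as.numeric(y[length(y)])   # most recent observation

# Naive forecast: repeat the last observation for every future period
naive_fc <- ts(rep(last_obs, h),
               start = tsp(y)[2] + 1 / frequency(y),  # period after the last one
               frequency = frequency(y))
print(naive_fc)
```

With the forecast package loaded, `naive(y, h = 4)` gives the same point forecasts and also adds prediction intervals.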
Forecasting: Predicting the future as accurately as possible, using historical data and knowledge of future events. May be short-term (e.g. scheduling, demand), medium-term (e.g. resource requirements), or long-term (e.g. strategic planning).
Goals: Events that we would like to happen.
Planning: A response to forecasts and goals.
ts objects
Time series can be stored as a ts object in R.
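As a minimal sketch of building ts objects with base R's `ts()`: the values below are hypothetical placeholders, and `y_annual` and `y_monthly` are illustrative names.

```r
# Annual series: one observation per year, starting in 2012 (hypothetical values)
y_annual <- ts(c(123, 39, 78, 52, 110), start = 2012)

# Monthly series: frequency = 12, starting in January 2003 (placeholder values)
y_monthly <- ts(seq_len(24), start = c(2003, 1), frequency = 12)

frequency(y_annual)    # observations per year in the annual series
frequency(y_monthly)   # observations per year in the monthly series
end(y_monthly)         # year and month of the final observation

# Keep only the observations from January 2004 onwards
y_2004 <- window(y_monthly, start = c(2004, 1))
print(y_2004)
```

The `frequency` argument is what tells R how many observations make up one seasonal period, which is why the exercises above use 4 for quarterly data and 12 for monthly data.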
Use autoplot() in R to visualize a time series.
Seasonal plots show data plotted against individual seasons to help identify underlying seasonal patterns and see where the pattern changes. Option: set polar = TRUE in ggseasonplot() to see the plot in polar coordinates.
Seasonal subseries plots: plots where the data for each season are collected in separate mini time plots.
Scatterplots are used to see relationships between predictor variables:
* Facet by a feature
* Find correlation between two time series
* Correlation matrix
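The correlation items above can be tried out with base R alone. This sketch reuses the six rows of tute1 from the `head()` output as sample data (far too few points for a reliable estimate, so the numbers are only illustrative); `tute1_head`, `r_sales_ad` and `cor_matrix` are illustrative names.

```r
# Six quarters of tute1, copied from the head() output (illustration only)
tute1_head <- data.frame(
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)

# Correlation between two series
r_sales_ad <- cor(tute1_head$Sales, tute1_head$AdBudget)

# Correlation matrix for all pairs of series
cor_matrix <- cor(tute1_head)
print(round(cor_matrix, 2))

# Scatterplot of one pair, then a scatterplot matrix of all pairs
plot(tute1_head$AdBudget, tute1_head$Sales,
     xlab = "AdBudget", ylab = "Sales")
pairs(tute1_head)
```

The correlation coefficient only measures the strength of a linear relationship, so it is worth looking at the scatterplots as well as the matrix.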
Note - The kth lag of an observation at time i is the observation k time periods earlier. Lag plots are helpful in identifying seasonality.
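To make the idea of a lag concrete, the sketch below pairs each observation with the one k periods earlier, which is the pairing a lag plot draws. The series is a hypothetical one that repeats every four observations, and `lag_pairs` is an illustrative helper, not a function from any package.

```r
# Hypothetical quarterly series that repeats the same four values (period 4)
y <- rep(c(5, 1, 3, 9), times = 6)

# Pair each observation with the one k periods earlier
lag_pairs <- function(y, k) {
  n <- length(y)
  data.frame(current = y[(k + 1):n], lagged = y[1:(n - k)])
}

# Correlation of the paired values at lags 1 to 8
lag_cor <- sapply(1:8, function(k) {
  pairs_k <- lag_pairs(y, k)
  cor(pairs_k$current, pairs_k$lagged)
})
names(lag_cor) <- paste0("lag", 1:8)
print(round(lag_cor, 2))
```

Because the series repeats every four observations, the pairs at lags 4 and 8 fall exactly on a straight line. That is the pattern gglagplot() makes visible for seasonal data.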
Autocorrelation measures the linear relationship between lagged values of a time series.
* When data have a trend, autocorrelations for small lags tend to be large and positive
* When data are seasonal, autocorrelations will be larger at multiples of the seasonal frequency
* When both, autocorrelations will have a combination of these
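The lag-k autocorrelation r_k is the sum of (y_t - mean) * (y_{t-k} - mean) over t = k+1, ..., T, divided by the sum of (y_t - mean)^2 over all t. The sketch below computes it by hand and checks the result against base R's `acf()`. The series is hypothetical (a linear trend plus a 12-month cycle) and `manual_acf` is an illustrative helper name.

```r
# Lag-k autocorrelation computed directly from the definition
manual_acf <- function(y, k) {
  n <- length(y)
  y_bar <- mean(y)
  sum((y[(k + 1):n] - y_bar) * (y[1:(n - k)] - y_bar)) / sum((y - y_bar)^2)
}

# Hypothetical monthly series: linear trend plus a 12-month seasonal cycle
t_idx <- 1:72
y <- 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12)

r_manual  <- sapply(1:24, function(k) manual_acf(y, k))
r_builtin <- as.numeric(acf(y, lag.max = 24, plot = FALSE)$acf)[-1]  # drop lag 0

print(round(r_manual, 3))
print(all.equal(r_manual, r_builtin))  # compare hand calculation with acf()
```

ggAcf() plots these same coefficients, so the hand calculation is a way to check what each spike in the correlogram represents.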
White noise - time series that show no autocorrelation.
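As a rough check for white noise, the sample autocorrelations can be compared with approximate 95% limits of about plus or minus 2 / sqrt(T), where T is the series length. The sketch below simulates white noise with `rnorm()` and counts how many autocorrelations fall inside those limits. The seed, series length and number of lags are arbitrary choices. It assumes the dashed blue lines drawn by ggAcf() correspond to these limits.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
n <- 200
wn <- rnorm(n)  # simulated white noise: independent N(0, 1) draws

# Sample autocorrelations at lags 1 to 20 (lag 0 dropped)
r <- as.numeric(acf(wn, lag.max = 20, plot = FALSE)$acf)[-1]

# Approximate 95% limits, close to +/- 2 / sqrt(n)
bound <- qnorm(0.975) / sqrt(n)

inside <- abs(r) <= bound
cat("Share of autocorrelations inside the limits:", mean(inside), "\n")
```

For a white noise series, roughly 95% of the spikes are expected to lie inside the limits. A series where many spikes fall outside, such as the retail and hsales series above, is therefore unlikely to be white noise.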